Methods and systems for process rollback in a shared memory parallel processor computing environment

ABSTRACT

Methods and systems for process rollback in a shared memory parallel processor computing environment use priority values to control process rollback. Process classes are defined and each process class is allocated a base priority value. Each process run by the system is associated with one of the classes. In accordance with a first embodiment, process priorities determine which process is rolled back. In accordance with a second embodiment, collision counts and class pair priorities determine which process is rolled back. The methods and systems ensure that critical processes are granted at least a minimum allocation of processor time, while less critical processes are not completely starved. The functionality of the system is thereby improved.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is the first application filed for the present invention.

MICROFICHE APPENDIX

Not applicable.

TECHNICAL FIELD

This invention relates in general to shared memory systems for use in parallel processing environments and, in particular, to methods and systems for process rollback in a shared memory parallel processor environment.

BACKGROUND OF THE INVENTION

The rapid growth in the Public Switched Telephone Network (PSTN), especially the rapid expansion of service features, has strained the processing capacity of incumbent switching equipment. This is particularly the case in wireless telephony environments where messaging loads between mobile switching centres are intense. As is well known, most incumbent switching systems in the PSTN have processing architectures that are based on a single central control component that is responsible for all top level processing in the system. Such single central control component architectures provide the advantage to application programmers of some simplification with respect to resource control, flow control and inter-process communication. However, single central control component architectures are subject to serious bottlenecks due principally to the fact that each process is dependent on the capacity of the single core processor. There has therefore been an acute interest in developing parallel processor control for incumbent switching systems to improve performance and permit the addition of new processor-intensive service features.

Parallel processor architectures are well known. However, the software written for such architectures is specifically designed to avoid processor conflicts while accessing shared resources such as shared memory. This is accomplished by providing exclusive access to the memories using software semaphores or methods for locking memory access buses, and the like. However, incumbent switching systems in the PSTN were typically written for a central control component, and in many cases it is not economically feasible to rewrite the application code for a parallel processor architecture. Aside from the complexity of such a rewrite, the time and cost incurred to complete such a task are generally considered to be prohibitive.

It is known in the art that when a shared memory parallel processor computing environment is used to execute code written for a single central control component, two processes can compete for a memory space in the shared memory. This competition is called blocking. Because rights to a memory space cannot be granted to more than one process at a time, one process must be “rolled back” while the other process is permitted to continue execution.

A shared memory control algorithm for mutual exclusion and rollback is described in U.S. Pat. No. 5,918,248, which issued on Jun. 29, 1999 to the Assignee. The patent describes a mechanism for permitting a shared memory single central control component parallel processing architecture to be used in place of a conventional system, without requiring code written for the conventional system to be rewritten. Exclusive Access and Shared Read Access implementations are disclosed. A rollback mechanism is provided which permits all the actions of a task in progress to be undone. The memory locations of that parallel processor architecture include standard locations and shared read locations. Any task is granted read access to a shared read location, but only a single task is granted write access to a shared read location at any given time.

A prior art rollback mechanism designed by the Assignee uses three priority levels (0, 1 and 2). When two processes compete for the same memory space, the process with the higher priority is permitted to continue execution and the process with the lower priority is rolled back. Initially, each process is assigned a default priority value of zero. When two processes having zero priority compete for a same memory space, the processes are executed on a first-in-first-out basis. The process that is rolled back then has its priority set at 1. If the same process is rolled back a second time, due to competition with another priority 1 process, the priority of the process is set at 2, which is the highest priority permitted. The scheduler ensures that only one priority 2 process is allowed to execute on the system at any one time. After the process has reached a commit point, the priority associated with the process is reset to zero.

While this algorithm represents a significant advance in the art, the rollback mechanism has not proven to support optimal performance. Performance is compromised for the principal reason that processes belonging to large classes are rolled back too often to meet their CPU time requirement.

It is therefore highly desirable to provide a method and system for rolling back processes in a shared memory, parallel processor computing environment that enhances performance by ensuring that access to computing resources is optimized.

SUMMARY OF THE INVENTION

It is therefore an object of the invention to provide methods and systems for process rollback in a shared memory, parallel processor computing environment that enhance performance by ensuring that processes are rolled back in proportion to their allotted processing time.

In accordance with a first embodiment of the invention, there is provided a method for process rollback in a shared-memory parallel-processor computing environment in which the parallel processors are operated concurrently and each processor sequentially runs processes. In accordance with the method, when two processes compete for a memory space in the shared memory, one of the processes is rolled back. The process that is rolled back is the process that has a lower priority value, or if the two processes have the same priority value, the process that collided with an owner of the memory space is rolled back. A process collides with the owner of the memory space if it attempts to access the memory space when the owner has possession of the memory space.

When a process is rolled back, a new priority value is computed for the rolled-back process by incrementing the process's priority value by a predetermined amount. If the two processes are members of different classes, the predetermined amount is preferably a priority value assigned to a class of which the rolled-back process is a member. If the two processes are members of the same class, the predetermined amount is preferably less than the priority value assigned to the class. When a process reaches a commit point, the priority value of the process is reset to a priority value assigned to the class of which the process is a member.

The priority value assigned to the class is preferably related to a proportion of processor time allocated to the class. The priority value may be directly proportional to the processor time allocated to the class, for example. In accordance with the first embodiment of the invention, the priority value of each process is stored in a process control block associated with the process.

The invention also provides a shared memory parallel processor system for executing processes concurrently, comprising means for storing a priority value associated with each process; means for determining which one of two processes is to be rolled back using the priority values associated with each of the two processes when the two processes compete for a memory space; and, means for computing a new priority value for the process that is rolled back.

The means for determining which process is to be rolled back comprises means for selecting the process that has a lower priority value, when the two processes have different priority values; and, means for selecting the process that collided with an owner of the memory space, when the two processes have the same priority value. When a collision occurs because two processes compete for a memory space, the system determines a class of which each process is a member. The system further comprises means for computing a new priority value for the rolled-back process, by incrementing the priority value by a predetermined amount. The predetermined amount is a first amount if the processes belong to different classes, and a second amount if the processes belong to the same class. The first amount is preferably a priority value associated with a process class of which the process is a member.

In accordance with a second embodiment of the invention, there is provided a method for process rollback in a shared memory parallel processor computing environment in which the processors run processes concurrently, and each process is a member of one of a plurality of process classes. The method comprises steps of maintaining a pair of variables for each pair of process classes, the variables storing a current priority value for each process class in each process class pair. When two processes that are members of different process classes compete for the same memory space, a collision count stored in a respective process control block of each process is examined to determine whether either collision count exceeds a first engineered collision threshold. If either collision count exceeds the first collision threshold, the process with the lower collision count is rolled back and the other process is permitted to continue execution. If neither collision count exceeds the threshold, current priority values of the respective class pair are used to determine which process is rolled back. The current priority values are compared and the process that is rolled back is one of: a) the process that is a member of the class that has a lower current priority value; and, b) if the two classes have the same current priority value, the process that collided with an owner of the memory space.

When a process is rolled back, a new priority value is stored in the priority variable for the class of which the rolled-back process is a member. The new priority value is stored by incrementing the variable by an amount equal to a base priority value stored in a process class parameter file.

The collision count for each process is stored in a process control block associated with each of the respective processes. The collision count associated with the rolled-back process is incremented each time the process is rolled back. When a collision count associated with a process exceeds an engineered second collision threshold, the process is run without competition until it commits, and the collision count is reset to zero.

When two processes that are members of the same class compete for the same memory space, the process that collided with an owner of the memory space is rolled back and a collision count associated with the rolled-back process is incremented. Each time a process is scheduled to run, a value of the collision count is compared with the second collision threshold, and if the collision count exceeds the second collision threshold, the process is permitted to run without competition. The collision count is reset to zero after the process reaches a commit point.

The invention further provides a shared-memory parallel-processor computing apparatus in which the processors run processes concurrently, and each process is a member of one of a plurality of process classes. The apparatus comprises means for storing a pair of variables for each pair of process classes, the variables storing a variable priority value for each process class in each process class pair. The apparatus further comprises means for determining, using the respective priority values, which process is rolled back when two processes that are members of different process classes compete for a memory space. The system also comprises means for computing and storing a collision count associated with each of the processes. The means for computing and storing the collision count preferably stores the collision count in a process control block associated with each of the respective processes. The means for computing and storing the collision count increments the collision count associated with the rolled-back process when a process is rolled back.

The means for determining which process is rolled back selects one of: the process with the lower collision count, if the collision count associated with either process exceeds a first collision count threshold; and, if neither collision count exceeds the first threshold, the process that is a member of the class that has a lower priority value, unless the two classes have the same priority value, in which case the process that collided with an owner of the memory space is rolled back.

The system further comprises means for storing a new priority value in the variable for the class of which the rolled-back process is a member. The means for storing a new priority value in the variable adds, to the value of the variable, an amount equal to a base priority value stored in a process class parameter file.

The invention therefore provides a parallel processor/shared memory computing system that ensures that critical processes are guaranteed adequate processor time, while also ensuring that less critical processes are not completely starved. Process execution is dynamically adjusted to ensure equitability of access to computing resources. A system controlled by the methods in accordance with the invention is therefore ensured of more stable operation, and functionality is improved.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention will now be described with reference to the attached drawings in which:

FIG. 1 is a schematic diagram of a parallel processor shared memory system that is known in the art;

FIG. 2 is a schematic diagram illustrating a process control block in accordance with one embodiment of the invention;

FIG. 3 is a flow diagram illustrating a method of rollback in accordance with a first embodiment of the invention;

FIG. 4 is a flow diagram illustrating a method in accordance with the first embodiment of the invention for ensuring that processes that are repeatedly rolled back have an opportunity to commit;

FIG. 5, which appears on sheet two of the drawings, is a schematic diagram illustrating a process control block in accordance with a second embodiment of the invention;

FIG. 6, which also appears on sheet two of the drawings, is a schematic diagram illustrating a class pairs priority table in accordance with the second embodiment of the invention;

FIG. 7 is a flow diagram illustrating a method of rollback in accordance with the second embodiment of the invention;

FIG. 8 is a flow diagram illustrating a method in accordance with the second embodiment of the invention for ensuring that processes that are repeatedly rolled back have an opportunity to commit; and

FIGS. 9A and 9B are tables illustrating process rollback in accordance with the second embodiment of the invention.

It will be noted that throughout the appended drawings, like features are identified by like reference numerals.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention provides methods and systems for rollback in a parallel processor/shared memory computing environment. In accordance with the invention, a plurality of process classes are respectively allocated a base priority value. Each process run by the processors is associated with one of the classes so that each process inherits an initial priority value from the class of which the process is a member. The initial priority value inherited from the class is a base priority value allocated to the class. The base priority value is stored, for example, in a class parameter file. Preferably, the base priority value is directly proportional to a processor time share allocation granted to the class of processes.

If two processes compete for a memory space in the common memory while being run, one of the processes is rolled back and the other process is permitted to continue execution. In accordance with the first embodiment of the invention, the process that is rolled back is the process having a lower priority value, or, if the processes have the same priority value, the process that is rolled back is the process that is the “collider”. When a process accesses a memory space, it becomes “owner” of the memory space until it has “committed”. If a memory space is owned, and another process attempts to access that memory space, the other process is a “collider”. When a process is rolled back, the priority value of the process may be incremented to increase the probability that the process will complete next time it is run. The amount by which the priority value is incremented depends upon the circumstances associated with the rollback and the method of rollback control used. The methods and apparatus in accordance with the invention ensure that critical tasks run to completion without starving less critical processes of an opportunity to commit.

FIG. 1 is a schematic diagram of a parallel processor/shared memory system 10, which is well known in the prior art. The system 10 includes a plurality of processors 12 (only four of which are shown) connected through an interconnect 14 to a main shared memory 16 made up of one or more memory modules (only one module is shown), in a manner well known in the art. One or more IO devices 18 are also connected to the interconnect 14. Such systems 10 are commonly equipped with redundant hardware components to increase fault tolerance, but redundancy is not essential to the invention and the redundant elements are not illustrated. Each processor 12 has a respective cache memory 20, which is also well known in the art. Each memory module 16 includes a data memory and memory ownership control functionality, commonly implemented in firmware. For ownership purposes, the data memory in each memory module 16 is divided into segments of memory referred to as “cache lines” or “memory lines”, as is well known in the art.

As is well understood in the art and explained in Assignee's U.S. Pat. No. 5,918,248 which issued on Jun. 29, 1999, the specification of which is incorporated herein by reference, special arrangements must be made to ensure that processes running on the system 10 will operate without being affected by other processes running at the same time. In a parallel processor computing environment, two processes running concurrently may compete for a same memory space in the memory modules 16. If this occurs, actions of one process may affect the integrity of data required by actions of the competing process. In order to avoid an unstable state in which data integrity is lost to one of the processes, a mechanism referred to as “rollback” has been adopted. Rollback eliminates the competition by granting exclusive access to the memory to one of the processes, while the other process is rolled back and returned to a process queue, and all operations that the process performed prior to rollback are undone.

In order to ensure that all processes are granted their fair share of processor time, some mechanism is required for controlling which processes are rolled back. In accordance with a first embodiment of the invention, rollback control is accomplished using a per-process priority tracking and control mechanism. The per-process priority values are tracked using a process control block 22 shown in FIG. 2. As is well known in the art, a process control block is associated with each process that is instantiated by the parallel processing system 10. The process control block 22 stores information required by the system 10 to run and control a process. In accordance with the invention, the process control block includes a process priority field 24 used to store a current priority value for each process run by the shared memory parallel processor system 10.

FIG. 3 is a flow diagram illustrating rollback control in accordance with the first embodiment of the invention. The system 10 (FIG. 1) continually monitors process execution to determine when an executing process has run to a commit point (step 30). If a process is determined to have run to a commit point, the priority value of the process is reset in step 31, and the process execution ends (step 32). The process priority value is reset to an engineered value (the class base priority value, for example). Thus, each time a process commits, the priority value 24 (FIG. 2) is reset to ensure that the process competes for shared memory space at the engineered priority level next time the process is run.

The system 10 also continuously monitors to determine if a blocking condition (collision) exists (step 33). A blocking condition is detected when hardware sends an interrupt because an identification of a process attempting to access a memory space does not match the identification of the process that currently “owns” the memory space. If a blocking condition exists, the system 10 determines a priority of the owner and the collider processes (step 34). The priorities are compared in step 35. If the priorities are different, the process with the lower priority is rolled back in step 36. If the priorities are the same, the process that caused the collision (the collider) is rolled back in step 38. After it is determined which process is to be rolled back, the classes to which the two processes belong are compared in step 40. If the classes are different, the priority value 24 (FIG. 2) of the rolled-back process is incremented using a first value. The first value is preferably the base priority value assigned to the class of which the rolled-back process is a member. If it is determined in step 40 that the classes are the same, the priority of the rolled-back process is incremented using a second value, which is preferably significantly less than the first value. After the priority value is incremented in one of steps 42, 44, the process that was rolled back returns to the beginning where the system 10 continues monitoring the process that continues to run, as well as any other processes that are running at the time.
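The decision logic of FIG. 3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the Assignee's implementation: the names `Process`, `BASE_PRIORITY`, `SAME_CLASS_INCREMENT`, `resolve_collision` and `commit` are hypothetical, as are the numeric values, and the actual rollback of memory operations is not modelled.

```python
from dataclasses import dataclass

# Hypothetical class base priorities (in the text these would come from
# a class parameter file); the values are chosen only for illustration.
BASE_PRIORITY = {"A": 5, "B": 2}

# Assumed "second value": an increment smaller than any class base priority.
SAME_CLASS_INCREMENT = 1


@dataclass
class Process:
    pid: int
    cls: str       # name of the process class
    priority: int  # stands in for priority field 24 of the process control block


def resolve_collision(owner: Process, collider: Process) -> Process:
    """Pick the process to roll back and raise its priority (steps 34 to 44)."""
    if owner.priority != collider.priority:
        # Different priorities: the lower-priority process is rolled back.
        loser = owner if owner.priority < collider.priority else collider
    else:
        # Equal priorities: the collider is rolled back.
        loser = collider

    if owner.cls != collider.cls:
        # Different classes: increment by the loser's class base priority.
        loser.priority += BASE_PRIORITY[loser.cls]
    else:
        # Same class: increment by the smaller second value.
        loser.priority += SAME_CLASS_INCREMENT
    return loser


def commit(proc: Process) -> None:
    """On commit, reset the priority to the engineered (class base) value."""
    proc.priority = BASE_PRIORITY[proc.cls]
```

With these assumed values, a Class B collider starting at priority 2 loses twice to a Class A owner at priority 5, reaching 4 and then 6, after which the owner is the one rolled back.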

FIG. 4 is a flow diagram illustrating a method in accordance with the first embodiment of the invention for ensuring that any particular process is not completely deprived of an opportunity to execute to completion (commit). In accordance with the method, whenever a process is scheduled by a scheduler (not illustrated) of the system 10, the priority value 24 (FIG. 2) in the process control block 22 is examined to determine whether the priority value has exceeded an engineered priority threshold. The engineered priority threshold is preferably a large value, for example, near a maximum expressible value of the priority value field 24. However, the level of the priority threshold is a matter of design choice. If the process priority value is not greater than the engineered priority threshold, the process is scheduled in a normal way (step 52), and control is transferred to the beginning of the process described above with reference to FIG. 3. If, however, it is determined in step 50 that the process priority value is greater than the engineered priority threshold, the priority values of processes currently running, or waiting in a run queue (not shown) to be run, are examined (step 54) to ensure that no other process with a same or a higher priority value is scheduled to run at the same time. This ensures that the process will run to a commit point. If no processes with an excess priority value are running or are queued to run, the process is scheduled to run in the normal way. If there are other potentially colliding processes, the process is queued to run with a high priority in step 56. The run queue and the executing processes are then monitored (step 58) to determine when the process is clear to run, i.e. when there are no other executing or queued processes with the same or higher priority value. When the process is cleared to run, it is scheduled in the normal way in step 52, and control is returned to the process described above with reference to FIG. 3. Thus, every process which, by circumstance, is repeatedly rolled back is guaranteed an opportunity to complete after the process's priority exceeds the engineered priority threshold.
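The scheduling check of FIG. 4 amounts to a simple predicate. The following Python sketch assumes a hypothetical threshold value and a hypothetical function name, `clear_to_run`; it reads "clear to run" as meaning that no other running or queued process has the same or a higher priority value, which is one interpretation of the text above.

```python
# Hypothetical engineered priority threshold; the text only says it is
# preferably large, near the maximum value of the priority field.
PRIORITY_THRESHOLD = 1000


def clear_to_run(candidate_priority: int, other_priorities: list[int]) -> bool:
    """Decide whether a process may be scheduled in the normal way.

    other_priorities holds the priority values of processes that are
    currently running or waiting in the run queue.
    """
    if candidate_priority <= PRIORITY_THRESHOLD:
        # Below the threshold the process is always scheduled normally (step 52).
        return True
    # Above the threshold, wait until no other process has the same or a
    # higher priority value (steps 54 to 58).
    return all(p < candidate_priority for p in other_priorities)
```

A scheduler built on this check would keep a process that fails it queued with high priority and re-evaluate the predicate as other processes commit.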

FIG. 5 illustrates a process control block 22 in accordance with a second embodiment of the invention. The process control block 22 stores a collision count 26 rather than the process priority value 24 shown in FIG. 2. The collision count 26 is used to track the number of times that a given process collides with another process and is rolled back. As will be explained below in more detail, the collision count 26 is used in much the same way as the priority value 24 to ensure that every process has an opportunity to complete if, by chance, it is repeatedly rolled back before it is permitted to commit.

FIG. 6 is a schematic diagram of a class pairs priority table 60 also used by the system 10 to implement the method in accordance with the second embodiment of the invention. The class pairs priority table 60 is used in conjunction with the collision count 26 to determine which process will be rolled back when a collision occurs. The table 60 is used to track priority values associated with process class pairs. Thus, in accordance with the second embodiment of the invention, priority values are associated with process class pairs, as opposed to individual processes as described above with reference to the first embodiment of the invention. Although the table shown in FIG. 6 illustrates only three process classes (Class A, Class B and Class C), it will be understood by those skilled in the art that the table 60 includes one row and one column for each process class defined for the system 10.

As shown in FIG. 6, there is a row 62 of process class identifiers that designates an owner of a memory space when a collision occurs. Column 64 contains process class identifiers associated with a colliding process. For the sake of illustration, base priority values 66 associated with each class are shown in the table 60. The base priority values 66 need not be stored in an actual class pairs priority table 60. As will be explained below, collisions that occur between members of the same class are treated differently than collisions that occur between members of different classes. Consequently, the class pairs priority table shown in FIG. 6 only stores a variable 68 for a collider class priority value and a variable 69 for an owner class priority value when the two classes in the pair are different. As will be explained below with reference to FIG. 7, the variable 68 that stores the collider priority value, the variable 69 that stores the owner priority value, and the collision count 26 (FIG. 5) are used to determine which process is rolled back when two processes collide.
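One possible in-memory layout for such a table is sketched below in Python. It assumes that a single pair of variables is kept per unordered pair of distinct classes, and that each variable starts at its class base priority; the class name `ClassPairsTable`, its methods, and the base priority given to Class C are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


class ClassPairsTable:
    """Sketch of a class pairs priority table with one entry per class pair."""

    def __init__(self, base: dict[str, int]):
        self.base = dict(base)
        # One entry for each unordered pair of distinct classes. Each entry
        # holds a current priority value for both classes of the pair,
        # initialised to the class base priorities. Same-class pairs have
        # no entry, since no variables are stored for them.
        self.pairs = {
            frozenset(pair): {cls: self.base[cls] for cls in pair}
            for pair in combinations(self.base, 2)
        }

    def priorities(self, owner_cls: str, collider_cls: str) -> tuple[int, int]:
        """Return (owner priority, collider priority) for the class pair."""
        entry = self.pairs[frozenset((owner_cls, collider_cls))]
        return entry[owner_cls], entry[collider_cls]

    def bump(self, loser_cls: str, other_cls: str) -> None:
        """Add the loser's class base priority to its variable in the pair."""
        self.pairs[frozenset((loser_cls, other_cls))][loser_cls] += self.base[loser_cls]
```

Because the variables are held per pair, a rollback in a collision between Class A and Class B changes only that pair's values and leaves the priorities used for collisions between Class B and Class C untouched.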

FIG. 7 is a flow diagram that illustrates the process of rollback control in accordance with a second embodiment of the invention. In step 70 it is determined whether a process has run to a commit point. If the process has run to a commit point, the process collision count 26 (FIG. 5) in the process control block is reset (step 71) to an engineered value, zero for example. If the process has not committed, it is determined in step 72 whether two processes are blocking, as described above with reference to FIG. 3. If two processes are determined to be in a blocking condition, the respective collision counts 26 are examined to determine whether either collision count 26 exceeds a first engineered threshold (step 73). The first collision count threshold is engineered to ensure fairness in a scheduling process. If either collision count 26 exceeds the first threshold, the process with the lower collision count is rolled back (step 74). Thereafter, the collision count of the rolled-back process is incremented in step 92, and the class priority of the class of which the rolled-back process is a member is incremented in step 94, and the process resumes at step 70.

If it is determined in step 73 that neither collision count 26 exceeds the first threshold, the class to which each of the processes belongs is determined in step 80. In step 82, the classes are compared to determine if they are different. If the classes are determined to be different, in step 84 the class priority variables 68,69 are retrieved from the class pairs table 60 (FIG. 6). In step 86, the class priority variables 68,69 are compared to determine if they are different. If the class priorities are different, the process with the lower class priority is rolled back in step 88. If the priorities of the two classes are the same, the collider process is rolled back in step 90. In either case, the collision count of the rolled-back process is incremented in step 92. Likewise, if it is determined in step 82 that the classes are the same, the collider process is rolled back in step 90. The collision count of the rolled-back process is incremented in step 92, and the class priority of the class of which the rolled-back process is a member is incremented in step 94. Thereafter the process resumes at step 70.
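The flow of FIG. 7 can be sketched in Python as follows. The sketch rests on several assumptions that the text does not settle: the threshold value, the base priorities of 5 and 2, the choice of the collider when both collision counts are equal and above the threshold, and the decision to leave class pair priorities unchanged for same-class collisions (for which no pair variables are stored). All names are hypothetical.

```python
from dataclasses import dataclass

BASE = {"A": 5, "B": 2}   # illustrative class base priorities
FIRST_THRESHOLD = 100     # hypothetical first collision count threshold

# Current priority values for the single class pair (A, B), initialised
# to the class base priorities.
pair_priority = {frozenset(("A", "B")): dict(BASE)}


@dataclass
class Process:
    cls: str
    collisions: int = 0   # stands in for collision count 26


def resolve(owner: Process, collider: Process) -> Process:
    """Return the process that is rolled back, updating counts and priorities."""
    different = owner.cls != collider.cls
    if owner.collisions > FIRST_THRESHOLD or collider.collisions > FIRST_THRESHOLD:
        # Step 74: the process with the lower collision count is rolled back
        # (assumption: the collider loses if the counts are equal).
        loser = owner if owner.collisions < collider.collisions else collider
    elif different:
        # Steps 84 to 90: compare the class pair priorities.
        prio = pair_priority[frozenset((owner.cls, collider.cls))]
        loser = owner if prio[owner.cls] < prio[collider.cls] else collider
    else:
        # Same class: the collider is rolled back (step 90).
        loser = collider

    loser.collisions += 1                     # step 92
    if different:                             # step 94
        prio = pair_priority[frozenset((owner.cls, collider.cls))]
        prio[loser.cls] += BASE[loser.cls]
    return loser
```

Under these assumptions, nine successive collisions in which a Class A process owns the memory and a Class B process collides roll back classes B, B, A, B, B, B, A, B, B in that order, with the Class B priority passing through 4, 6, 8, 10, 12, 14 and 16 while the Class A priority moves from 5 to 10 to 15.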

FIG. 8 is a flow diagram illustrating a method in accordance with the second embodiment of the invention for ensuring that any particular process is not completely deprived of any opportunity to commit. In accordance with the method, whenever a process is scheduled by the scheduler (not shown) of the system 10, the collision count 26 (FIG. 5) in the process control block 22 is examined. In step 140, it is determined whether the collision count 26 associated with the process is greater than a second engineered collision count threshold, which is higher than the first collision count threshold described above with reference to FIG. 7. If the count is not greater than the second collision count threshold, the process is scheduled in the normal way (step 142) and control is passed to the beginning of the process described above with reference to FIG. 7. If the collision count 26 is greater than the second engineered collision count threshold, the collision counts of processes currently running, or already scheduled to be run, are checked in step 144. If no other process with an excessive collision count is running or scheduled to be run, the process is scheduled normally in step 142. Otherwise, the process is queued with high priority (step 146). Currently running processes and processes already scheduled to be run are then monitored in step 148 until no other process with an excessive collision count is running or scheduled to be run. Thereafter, the process is scheduled in the normal way (step 142). This ensures that the process will run to a commit point. Thus, every process which, by circumstance, is repeatedly rolled back is guaranteed an opportunity to run to a commit point after the collision count exceeds the second engineered collision count threshold.

It should be noted that the class pairs priorities stored in the class pairs priority variables 68,69 are preferably reset to engineered values after one of the variables 68,69 becomes greater than an engineered class pairs priority threshold. The size of the class pairs priority threshold is not important, because the comparison (step 86 of FIG. 7) is a relative test. A reset is preferably performed, however, to prevent the variables 68,69 from causing numeric overflow. Other methods of preventing numeric overflow of the variables can also be implemented.
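A reset of this kind might look like the following Python sketch. It assumes that the "engineered values" are the class base priorities and uses a hypothetical threshold and function name; neither choice is stated in the text.

```python
# Hypothetical engineered class pairs priority threshold.
PAIR_PRIORITY_THRESHOLD = 10_000


def bump_with_reset(prio: dict[str, int], base: dict[str, int], loser_cls: str) -> None:
    """Increment the loser's class priority, then reset the pair if needed.

    prio maps each class of one class pair to its current priority value.
    """
    prio[loser_cls] += base[loser_cls]
    if max(prio.values()) > PAIR_PRIORITY_THRESHOLD:
        # Assumption: the engineered reset values are the class base priorities.
        for cls in prio:
            prio[cls] = base[cls]
```

Resetting to the base values discards the current difference between the two variables. An alternative in the spirit of the "other methods" mentioned above would be to subtract the smaller value from both variables, which keeps their relative order intact.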

FIGS. 9A and 9B are tables that further illustrate the method in accordance with the second embodiment of the invention. As shown in FIG. 9A, the priority values associated with a class pair change as collisions occur between processes that are members of the class pair. FIG. 9A also shows the class to which the process belongs that is rolled back when a collision occurs. In the example shown in FIG. 9A, processes that are members of Class A are the owners of the memory space when the collisions occur, and processes that are members of Class B are the colliders. As shown at row 92, each of Classes A and B is initialized to its base priority value, 5 and 2 respectively. When an initial collision occurs, the process that is a member of Class B is rolled back, as indicated at 132. When a second collision occurs (row 94) the class priority value associated with Class B has been increased to 4, which is still less than the class priority value associated with Class A. Consequently, the process that is a member of Class B is rolled back. After rollback, the priority value of Class B is incremented to 6 (row 96), and when the next collision occurs, the process that is a member of Class A is rolled back because the priority value of Class A is less than that of Class B. When the process that is a member of Class A is rolled back, the priority value of Class A is incremented by the class base priority value and becomes 10 (row 98). When a next collision occurs, the process that is a member of Class B is therefore rolled back. The same is true for row 100 where the process that is a member of Class B is rolled back because Class B has a priority value of 8, whereas Class A has a priority value of 10. At row 102, the class priority values are equal and the process that is a member of Class B is rolled back because the process that is a member of Class A is owner of the memory space, as explained above with reference to FIG. 7. When the process of Class B is rolled back, the priority value of Class B in the class pairs priority table 60 is incremented to 12. Consequently, in row 104, when a collision occurs the process that is a member of Class A is rolled back because the priority value of Class A is less than that of Class B. In rows 106 and 108, the process that is a member of Class B is rolled back because the priority value of Class B is less than that of Class A, which was incremented after the collision that occurred in row 104. However, in row 110, the priority value associated with Class B is again greater than the priority value associated with Class A, and the process that is a member of Class A is rolled back. As is apparent from a count of rollbacks in rows 92-104, the Class A process “wins” 5 times, while the Class B process wins twice.

It should be noted that this ratio of wins is directly proportional to the class base priority values of the two classes, 5 and 2. Since this cycle repeats, each process class is ensured an engineered share of processor time based on class base priority.
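The proportion can be checked from the values recited for FIG. 9A. Over the seven collisions of rows 92-104, each class pair priority value advances by the same total of 10, so the number of rollbacks charged to each class is inversely proportional to its base priority. The arithmetic below restates the example only and is not a general proof.

```latex
\[
\begin{aligned}
\text{Class A:}\quad & 5 \rightarrow 15 = 5 + 2 \times 5 \quad \text{(2 rollbacks: rows 96, 104)} \\
\text{Class B:}\quad & 2 \rightarrow 12 = 2 + 5 \times 2 \quad \text{(5 rollbacks: rows 92, 94, 98, 100, 102)} \\
\frac{\text{wins of A}}{\text{wins of B}} &= \frac{5}{2}
  = \frac{\text{base priority of A}}{\text{base priority of B}}
\end{aligned}
\]
```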

FIG. 9B illustrates the method in accordance with the second embodiment of the invention when collisions occur in which process B is owner of the memory and process A is collider. Each of the processes belonging to Classes B and A is initialized to its inherited priority value, 2 and 5 respectively. When an initial collision occurs, the process that is a member of Class B is rolled back, as indicated at 134 of row 112. When a second collision occurs (row 114) the class pairs priority value associated with Class B has been incremented to 4. However, the process that is a member of Class B is still rolled back because its priority value is less than that of the process that is a member of Class A. After rollback, the priority value of Class B is incremented by its base value to 6 (row 116). Consequently, when a collision occurs with a process that is a member of Class A, the process that is the member of Class A is rolled back because the priority value of Class A is less than that of Class B. Since the process of Class A is rolled back at row 116, its priority value is incremented by the class base priority value to 10 (row 118). When a further collision occurs, the process of Class B is therefore rolled back. After rollback, the priority value of the process of Class B is increased to 8 (row 120), and when a collision occurs the process that is a member of Class B is again rolled back. The priority of Class B is therefore incremented to 10. The priority values of Classes A and B are therefore the same. When a collision occurs, the process that belongs to Class A is rolled back because the process of Class B is owner of the memory space (row 122). When the process that is a member of Class A is rolled back, the priority value of Class A is increased to 15 (row 124). Therefore, when a collision occurs, the process that belongs to Class B is rolled back. This occurs again in row 126, even though the priority value of Class B has been increased by its base priority value to 12, and once more in row 128 even though the Class B priority value has been increased to 14. At row 130, however, the priority value of Class B is increased to 16 and the process that is a member of Class A is therefore rolled back. Finally, at row 132, the priority value of Class A stands at 20, while the priority value of Class B remains 16 and the process that is a member of Class B is rolled back. As is apparent from a count of rollbacks in FIGS. 9A and 9B, the proportion of rollbacks is the same.
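The sequences described for FIGS. 9A and 9B can be replayed with a short simulation. The sketch below models only the class pair priority comparison, the tie-break in favour of the owner, and the increment by base priority. The function name `simulate` and its parameters are illustrative and are not taken from the description, and collision counts are ignored.

```python
def simulate(base, owner_class, collisions):
    """Replay collisions between one process of each of two classes.

    base: dict mapping the two class names to base priority, e.g. {"A": 5, "B": 2}
    owner_class: class of the process that owns the memory space
    Returns a list of (priorities_before_collision, rolled_back_class) tuples.
    """
    a, b = base                      # the two class names
    priority = dict(base)            # class pair priorities start at base values
    collider_class = b if owner_class == a else a
    history = []
    for _ in range(collisions):
        if priority[a] != priority[b]:
            loser = min(priority, key=priority.get)   # lower priority is rolled back
        else:
            loser = collider_class                    # tie: the collider is rolled back
        history.append((dict(priority), loser))
        priority[loser] += base[loser]                # increment by class base priority
    return history


if __name__ == "__main__":
    for owner, n in (("A", 10), ("B", 11)):
        print("owner", owner)
        for before, loser in simulate({"A": 5, "B": 2}, owner, n):
            print(before, "rolled back:", loser)
```

With base priorities of 5 and 2, the ten collisions with Class A as owner reproduce rows 92-110, and the eleven collisions with Class B as owner reproduce rows 112-132. In both runs the first seven collisions roll back the Class B process five times and the Class A process twice.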

The patterns shown in FIGS. 9A and 9B repeat cyclically as collisions between members of the two classes occur, as is easily seen by subtracting 10 from the process priority at row 106 (FIG. 9A) and from the class priority at row 126 (FIG. 9B). As explained above with reference to FIGS. 7 and 8, if a collision count of any given process exceeds the first engineered collision threshold, that process is permitted to continue execution regardless of the priority values associated with the colliding classes in the class pairs table, provided that the process has a higher collision count 26 than the process with which it collided. Furthermore, if a particular process is rolled back enough times that its collision count 26 exceeds the second engineered threshold, then it is scheduled as the only process with that high a collision count to be permitted to execute. Thus, the process is guaranteed to win any collisions that may occur before it commits.
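The way the first collision threshold overrides the class pair priorities can be sketched as a single decision function. The names `choose_rollback` and `FIRST_COLLISION_THRESHOLD`, the threshold value, and the dictionary layout are assumptions made for the example. The order of the checks, and the fall-through to the priority comparison when the two collision counts are equal, are also assumptions, since that case is not spelled out here.

```python
# Hypothetical value; the description only states that the threshold is "engineered".
FIRST_COLLISION_THRESHOLD = 4


def choose_rollback(owner, collider, pair_priority):
    """Return "owner" or "collider" to indicate which process is rolled back.

    owner, collider: dicts with keys "cls" (class name) and "collisions"
                     (stands in for collision count 26).
    pair_priority: dict mapping class name to current class pair priority.
    """
    oc, cc = owner["collisions"], collider["collisions"]
    # Collision-count override: the process with the lower count is rolled back.
    if max(oc, cc) > FIRST_COLLISION_THRESHOLD and oc != cc:
        return "owner" if oc < cc else "collider"
    # Same class: the collider is rolled back.
    if owner["cls"] == collider["cls"]:
        return "collider"
    # Different classes: lower class pair priority is rolled back;
    # on a tie the collider is rolled back.
    if pair_priority[owner["cls"]] < pair_priority[collider["cls"]]:
        return "owner"
    return "collider"


if __name__ == "__main__":
    prio = {"A": 5, "B": 2}
    print(choose_rollback({"cls": "A", "collisions": 1},
                          {"cls": "B", "collisions": 6}, prio))
```

In the usage example, the Class B collider has the lower class pair priority but a collision count above the assumed threshold, so the Class A owner is the process selected for rollback.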

FIGS. 9A and 9B therefore illustrate that the method in accordance with the second embodiment of the invention also ensures that processor time in the shared memory parallel processor computing system in accordance with the invention is distributed among processes in accordance with engineered parameters. The invention therefore provides a flexible, dynamic method for process rollback in a shared memory/parallel processor computing system that ensures equitable access to computing resources by all process classes.

The embodiment(s) of the invention described above is(are) intended to be exemplary only. The scope of the invention is therefore intended to be limited solely by the scope of the appended claims.

I claim:
 1. A method for process rollback in a shared-memory parallel-processor computing environment in which the parallel processors are operated concurrently and each processor sequentially runs processes, comprising steps of: a) when two processes compete for a memory space in the shared memory, rolling back one of: i) the process that has a lower priority value; and ii) if the two processes have the same priority value, the process that collided with an owner of the memory space; and b) computing a new priority value for the rolled-back process by incrementing the priority value by a predetermined amount, wherein if the two processes are members of different classes, the predetermined amount is a base priority value assigned to a class of which the rolled-back process is a member.
 2. A method as claimed in claim 1 wherein if the two processes are members of the same class, the predetermined amount is less than the priority value assigned to the class.
 3. A method as claimed in claim 1 further comprising a step of restoring a priority value of the process to the base priority value assigned to the class of which the process is a member each time the process runs to a commit point.
 4. A method as claimed in claim 3 wherein the base priority value assigned to the class is related to a proportion of processor time allocated to the class.
 5. A method as claimed in claim 4 wherein the base priority value is directly proportional to the processor time allocated to the class.
 6. A method as claimed in claim 1 wherein the priority value of the process is stored in a process control block associated with the process.
 7. A method as claimed in claim 1 wherein if the priority value of a process exceeds an engineered process priority threshold the process is scheduled and permitted to run without competition from another process having a same or higher priority value.
 8. A method for process rollback in a shared-memory parallel-processor computing environment in which the processors run processes concurrently, and each process is a member of one of a plurality of process classes, comprising steps of: a) providing a pair of variables for each pair of process classes, the pair of variables respectively storing a current priority value for each process class in each process class pair; and b) when two processes that are members of different process classes compete for a same memory space, using the respective priority values for determining which process is rolled back.
 9. The method as claimed in claim 8 wherein using the respective priority values for determining which process is rolled back further comprises a step of comparing the respective priority values and rolling back one of: a) the process that is a member of the class that has a lower priority value; and b) if the two classes have the same priority value, the process that collided with an owner of the memory space.
 10. The method as claimed in claim 9 further comprising a step of storing a new priority value in the variable for the class of which the rolled-back process is a member.
 11. The method as claimed in claim 10 wherein the step of storing a new priority value in the variable comprises incrementing the priority value stored in the variable by an amount equal to a base priority value allocated to the class of which the process is a member.
 12. The method as claimed in claim 11 further comprising a step of providing a collision count associated with each of the processes.
 13. The method as claimed in claim 12 wherein the collision count is stored in a process control block associated with each of the respective processes.
 14. The method as claimed in claim 13 further comprising a step of incrementing the collision count associated with a process each time the process is rolled back.
 15. The method as claimed in claim 14 wherein when two processes compete for the same memory space and a collision count associated with either process exceeds a first engineered collision threshold, the process with a lower collision count is rolled back and the process with a higher collision count is permitted to continue execution.
 16. The method as claimed in claim 8 wherein when two processes that are members of the same class compete for the same memory space, the method further comprises steps of: a) rolling back the process that collided with an owner of the memory space; and b) incrementing a collision count associated with the rolled-back process.
 17. The method as claimed in claim 16 further comprising steps of: a) comparing a value of the collision count with a second engineered collision threshold; and b) if the collision count exceeds the second engineered collision threshold, scheduling the process so that it will run to a commit point by scheduling the process as the only process with a collision count that is equal to, or greater than, the second engineered collision threshold.
 18. The method as claimed in claim 17 further comprising a step of resetting the collision count to zero after the process runs to a commit point.
 19. A shared memory parallel processor system for executing processes concurrently, comprising: a) means for storing a priority value associated with each process; b) means for determining which one of two processes is to be rolled back using the priority values associated with each of the two processes when the two processes compete for a memory space; and c) means for computing a new priority value for the process that is rolled back by incrementing the priority value by a predetermined amount, the predetermined amount being a first amount if the processes belong to different classes, and a second amount if the processes belong to the same class.
 20. The system as claimed in claim 19 wherein the first amount is a base priority value associated with a process class to which the process belongs.
 21. The system as claimed in claim 20 comprising means for allocating the base priority value in proportion to engineered parameter values.
 22. The system as claimed in claim 19 wherein the means for determining which process is to be rolled back, comprises: a) means for selecting the process that has a lower priority value, when the two processes have different priority values; and b) means for selecting the process that collided with an owner of the memory space, when the two processes have the same priority value.
 23. The system as claimed in claim 19 further comprising means for scheduling the process and permitting the process to run to a commit point if the priority value of the process exceeds an engineered process priority threshold.
 24. The system as claimed in claim 19 further comprising means for determining a class to which each process belongs.
 25. A shared-memory parallel-processor computing system in which the processors run processes concurrently, and each process is a member of one of a plurality of process classes, the apparatus comprising: a) means for storing a pair of variables for each pair of process classes, the variables storing a variable priority value for each process class in each process class pair; and b) means for determining, using the respective priority values, which process is rolled back when two processes that are members of different process classes compete for a memory space.
 26. The system as claimed in claim 25 wherein the means for determining which process is rolled back selects one of: a) the process that is a member of the class in the pair that has a lower priority value; and b) if the processes are members of a class pair that have the same priority value, the process that collided with an owner of the memory space.
 27. The system as claimed in claim 26 further comprising means for storing a new priority value in the variable for the class of the class pair of which the rolled-back process is a member.
 28. The system as claimed in claim 27 wherein the means for storing a new priority value in the variable adds, to the value of the variable, an amount equal to a base priority value.
 29. The system as claimed in claim 28 further comprising means for computing and storing a collision count associated with each of the processes.
 30. The system as claimed in claim 29 further comprising means for storing a first collision count threshold and a second collision count threshold.
 31. The system as claimed in claim 29 wherein the means for computing and storing the collision count stores the collision count in a process control block associated with each of the respective processes.
 32. The system as claimed in claim 31 wherein the system increments the collision count associated with the rolled-back process.
 33. The system as claimed in claim 32 further comprising means for rolling back a process with a lower collision count when two processes compete for a memory space and the collision count of either process exceeds the first collision count threshold.
 34. The system as claimed in claim 29 wherein the system resets the collision count to zero when a process has run to a commit point.
 35. The system as claimed in claim 33 wherein when two processes that are members of the same class compete for a memory space, the apparatus rolls back the process that collided with an owner of the memory space, and increments the collision count associated with the rolled back process.
 36. The system as claimed in claim 35 further comprising means for comparing a value of the collision count with the second engineered collision threshold, and if the collision count exceeds the second engineered collision threshold, scheduling the process to run without competition.
 37. The system as claimed in claim 36 wherein the system schedules the process to run to a commit point by scheduling the process so that it does not run concurrently with another process having a collision count that is equal to, or greater than, the second engineered collision threshold.